Background: Acute lymphoblastic leukemia (ALL) is the most common malignancy in children. Diagnosis of ALL relies on subjective morphologic criteria, assessed by examining Giemsa-stained bone marrow (BM) smears under a microscope. Such examination, however, does not provide information about the ALL subtype and molecular profile, which are essential for diagnosis and guidance of treatment. To obtain this information, clinicians resort to advanced techniques, such as immunophenotyping and genomic assays. Unlike Giemsa staining, these additional assays are expensive and time-consuming, require specialized laboratories and trained clinicians, and are inaccessible in countries with limited resources.

Convolutional neural networks (CNNs) are machine learning methods that currently provide state-of-the-art performance in image analysis. It has recently been shown that CNNs can extract information from tumor morphology that is not visible to the human eye, such as molecular biomarker expression from hematoxylin and eosin-stained histological images. Nevertheless, it remains unclear whether BM smears, which lack tissue architecture and a structured layout of cells, can also provide such information. We sought to evaluate whether CNN-based image analysis of Giemsa-stained BM samples could predict B and T ALL subtypes, ETV6-RUNX1 translocation, and initial risk group stratification.

Methods: We collected a total of 276 Giemsa-stained BM slides from 163 pediatric patients diagnosed with ALL (n = 69), acute myeloid leukemia (AML) (n = 35), or non-leukemic BM (n = 59) between 2009 and 2022 at the Pediatric Hemato-Oncology Department, Sourasky Medical Center, Israel. Patient information included age, white blood cell (WBC) count at presentation, B/T subtype, central nervous system (CNS) involvement, and the presence of ETV6-RUNX1 translocation (Table 1). For each patient, we collected 1-4 Giemsa-stained smears from the diagnostic BM and scanned them at 0.25 micron/pixel, 10× magnification.

The patients were split into training (80%) and test (20%) sets, and the images were automatically segmented to remove the background. Each slide was split into non-overlapping 256×256 tiles, yielding a total of 138,000 tile images containing cells (Figure 1). Importantly, we skipped the common step of cell segmentation. This end-to-end approach allows the computational model to exploit the information in the entire smear, including the morphology of all cells and the global diversity and arrangement of the cells in the smear. Using only the training cases, a deep CNN was trained to classify each Giemsa tile as control, ALL, or AML. The model was further trained to classify ALL tiles as either B or T subtype and to predict the presence of ETV6-RUNX1 translocation in B-ALL. The model was also trained to stratify patients into initial low-risk (LR) or high-risk (HR) groups, where LR is defined as age<10, WBC<50K, CNS=1 or 2, and B-ALL. The model was then applied to the held-out test set, and tile scores were aggregated to produce per-slide and per-patient prediction scores. The area under the receiver operating characteristic curve (AUC) was calculated for each prediction task. To obtain a robust statistical analysis, the entire process was repeated five times, each time with a different test group, and the AUC results were averaged.
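The evaluation steps described above (patient-level split, the LR rule, aggregation of tile scores, and AUC) can be illustrated with a minimal Python sketch. It is not the implementation used in this study: all function names are hypothetical, mean pooling of tile scores is an assumed aggregation rule (the text does not specify one), the WBC threshold is assumed to be 50,000 cells per microliter, and the toy scores at the bottom are made-up values for illustration only, not study data.

```python
import random
from collections import defaultdict
from statistics import mean


def is_low_risk(age_years, wbc_per_ul, cns_status, lineage):
    """LR rule from the text: age<10, WBC<50K, CNS=1 or 2, and B-ALL.
    Assumption: WBC is expressed in cells per microliter."""
    return (age_years < 10 and wbc_per_ul < 50_000
            and cns_status in (1, 2) and lineage == "B")


def split_patients(patient_ids, test_fraction=0.2, seed=0):
    """Split at the patient level, so every slide and tile of a patient
    falls entirely in either the training set or the test set."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return set(ids[n_test:]), set(ids[:n_test])  # (train, test)


def aggregate(tile_scores, keys):
    """Pool tile scores per key (slide ID or patient ID).
    Assumption: simple mean pooling."""
    groups = defaultdict(list)
    for score, key in zip(tile_scores, keys):
        groups[key].append(score)
    return {key: mean(values) for key, values in groups.items()}


def auc(scores, labels):
    """AUC as the probability that a positive case scores higher than a
    negative case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up tile scores (illustration only, not study data).
tile_scores = [0.9, 0.8, 0.7, 0.2, 0.4, 0.6, 0.3]
tile_patient = ["p1", "p1", "p2", "p3", "p3", "p4", "p4"]
patient_label = {"p1": 1, "p2": 1, "p3": 0, "p4": 0}

patient_scores = aggregate(tile_scores, tile_patient)
ids = sorted(patient_scores)
patient_auc = auc([patient_scores[i] for i in ids],
                  [patient_label[i] for i in ids])
```

Repeating the split with a different `seed` and averaging the resulting AUC values mirrors the five-repeat procedure described above. Splitting by patient rather than by tile matters because tiles from the same patient are highly correlated, and mixing them across the training and test sets would inflate the measured AUC.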

Results: The AUCs for classification of ALL versus control and ALL versus AML were above 0.99 at the patient level (Table 1), demonstrating that the CNN can distinguish ALL cases from the rest almost perfectly. The AUCs for classification of B/T subtypes, initial HR/LR subgroups, and ETV6-RUNX1 translocation were 0.79, 0.68, and 0.68, respectively, at the patient level, all with statistically significant p-values. Notably, the system made its predictions from slides scanned at a magnification considerably lower than the magnification routinely used by clinicians for diagnosis.

Conclusions: These results show, for the first time, that machine learning models can not only reliably identify pediatric ALL but also predict the initial risk group, B/T subtype, and presence of ETV6-RUNX1 translocation from Giemsa-stained BM samples, features that cannot be determined from morphological analysis by a human observer. Such a system may enable physicians in countries that lack immunophenotyping and molecular analysis capabilities to refine ALL diagnosis and initial risk stratification based on Giemsa-stained BM smears alone.

No relevant conflicts of interest to declare.
